Splitting criteria


Linear TreeShap

Yu, Peng

Neural Information Processing Systems

Decision trees are well known for their ease of interpretation. To improve accuracy, however, we need to grow deep trees or ensembles of trees, which are hard to interpret, offsetting the original benefit. Shapley values have recently become a popular way to explain the predictions of tree-based machine learning models: they provide a linear weighting of features that is independent of the tree structure. Their rise in popularity is mainly due to TreeShap, which solves a problem of general exponential complexity in polynomial time. Following extensive adoption in industry, even more efficient algorithms are required. This paper presents a more efficient and straightforward algorithm: Linear TreeShap. Like TreeShap, Linear TreeShap is exact and requires the same amount of memory.
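As a point of reference for readers new to Shapley-value explanations of trees, a minimal usage sketch with the shap package (which implements TreeShap) might look as follows; the model, data, and sample sizes are illustrative choices, not the paper's experimental setup.

```python
# Minimal sketch: explaining a tree ensemble with TreeShap via the shap package.
# The model, data, and hyperparameters are illustrative assumptions.
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=6, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)        # exact, polynomial-time Shapley values for trees
shap_values = explainer.shap_values(X[:5])   # one additive attribution per feature per sample

# Local accuracy: the attributions plus the base value reconstruct each prediction.
print(np.allclose(explainer.expected_value + shap_values.sum(axis=1), model.predict(X[:5])))
```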



Learning to act: a Reinforcement Learning approach to recommend the best next activities

Branchi, Stefano, Di Francescomarino, Chiara, Ghidini, Chiara, Massimo, David, Ricci, Francesco, Ronzani, Massimiliano

arXiv.org Artificial Intelligence

The rise of process data availability has recently led to the development of data-driven learning approaches. However, most of these approaches restrict the use of the learned model to predicting the future of ongoing process executions. The goal of this paper is to move a step forward and leverage available data to learn to act, by supporting users with recommendations derived from an optimal strategy with respect to a measure of performance. We take the optimization perspective of one process actor and recommend the best activities to execute next, in response to what happens in a complex external environment where there is no control over exogenous factors. To this aim, we investigate an approach that learns, by means of Reinforcement Learning, the optimal policy from observations of past executions and recommends the best activities to carry out in order to optimize a Key Performance Indicator of interest. The validity of the approach is demonstrated on two scenarios taken from real-life data.
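To make the idea concrete, a minimal tabular Q-learning sketch over activities is given below; the toy event log, the reward standing in for the KPI, and the state encoding (the last executed activity) are illustrative assumptions rather than the paper's formulation.

```python
# Sketch of learning a next-activity policy from past executions with tabular Q-learning.
# Traces, rewards, and the state abstraction are toy assumptions for illustration only.
import random
from collections import defaultdict

# "Past executions": each trace is a list of (activity, reward) pairs, where the reward
# stands in for the contribution to a KPI of interest (e.g. negative cycle time).
traces = [
    [("register", 0.0), ("check", 0.0), ("approve", 1.0)],
    [("register", 0.0), ("check", 0.0), ("reject", -1.0)],
    [("register", 0.0), ("escalate", 0.0), ("approve", 0.5)],
]
activities = {a for trace in traces for a, _ in trace}

alpha, gamma = 0.5, 0.9
Q = defaultdict(float)  # Q[(state, action)], with state = last executed activity

for _ in range(200):
    state = "start"
    for activity, reward in random.choice(traces):
        best_next = max(Q[(activity, a)] for a in activities)
        Q[(state, activity)] += alpha * (reward + gamma * best_next - Q[(state, activity)])
        state = activity

def recommend(state):
    """Best next activity under the learned policy for an ongoing execution."""
    return max(activities, key=lambda a: Q[(state, a)])

print(recommend("check"))  # expected to favour the branch with the best KPI ("approve")
```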


FairUDT: Fairness-aware Uplift Decision Trees

Zahid, Anam, Ali, Abdur Rehman, Raza, Shaina, Shahnawaz, Rai, Kamiran, Faisal, Karim, Asim

arXiv.org Machine Learning

Training data used for developing machine learning classifiers can exhibit biases against specific protected attributes. Such biases typically originate from historical discrimination or from underlying patterns that disproportionately under-represent minority groups, such as those identified by gender, religion, or race. In this paper, we propose a novel approach, FairUDT, a fairness-aware Uplift-based Decision Tree for discrimination identification. FairUDT demonstrates how the integration of uplift modeling with decision trees can be adapted to include fair splitting criteria. Additionally, we introduce a modified leaf relabeling approach for removing discrimination. We divide our dataset into favored and deprived groups based on a binary sensitive attribute, with the favored dataset serving as the treatment group and the deprived dataset as the control group. By applying FairUDT and our leaf relabeling approach to preprocess three benchmark datasets, we achieve an acceptable accuracy-discrimination tradeoff. We also show that FairUDT is inherently interpretable and can be utilized in discrimination detection tasks. The code for this project is available at https://github.com/ara-25/FairUDT.
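For intuition, one way an uplift-style split score over favored (treatment) and deprived (control) groups could be computed is sketched below; the divergence measure and the gain definition are assumptions for illustration, not necessarily the fair splitting criteria used by FairUDT.

```python
# Illustrative uplift-style split gain on favored (treatment) vs. deprived (control) groups.
# The divergence (squared Euclidean distance between outcome distributions) is an assumption.
import numpy as np

def outcome_dist(y):
    """Empirical distribution of a binary outcome."""
    p1 = y.mean() if len(y) else 0.0
    return np.array([1.0 - p1, p1])

def divergence(y_treat, y_ctrl):
    """Squared Euclidean distance between treatment and control outcome distributions."""
    return float(np.sum((outcome_dist(y_treat) - outcome_dist(y_ctrl)) ** 2))

def uplift_split_gain(y, favored, feature, threshold):
    """Weighted child divergence minus parent divergence for the split feature <= threshold.

    favored == 1 marks the favored (treatment) group, 0 the deprived (control) group.
    """
    left = feature <= threshold
    gain = -divergence(y[favored == 1], y[favored == 0])
    for mask in (left, ~left):
        gain += mask.mean() * divergence(y[mask & (favored == 1)], y[mask & (favored == 0)])
    return gain

rng = np.random.default_rng(0)
y, favored = rng.integers(0, 2, 500), rng.integers(0, 2, 500)
feature = rng.normal(size=500)
print(uplift_split_gain(y, favored, feature, threshold=0.0))
```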


Splitting criteria for ordinal decision trees: an experimental study

Ayllón-Gavilán, Rafael, Martínez-Estudillo, Francisco José, Guijo-Rubio, David, Hervás-Martínez, César, Gutiérrez, Pedro Antonio

arXiv.org Artificial Intelligence

Ordinal Classification (OC) is a machine learning field that addresses classification tasks where the labels exhibit a natural order. Unlike nominal classification, which treats all classes as equally distinct, OC takes the ordinal relationship into account, producing more accurate and relevant results. This is particularly critical in applications where the magnitude of classification errors matters. Despite this, OC problems are often tackled with nominal methods, leading to suboptimal solutions. Although decision trees are one of the most popular classification approaches, ordinal tree-based approaches have received less attention than other classifiers. This work conducts an experimental study of tree-based methodologies specifically designed to capture ordinal relationships. A comprehensive survey of ordinal splitting criteria is provided, standardising the notation used in the literature for clarity. Three ordinal splitting criteria, Ordinal Gini (OGini), Weighted Information Gain (WIG), and Ranking Impurity (RI), are compared to the nominal counterparts of the first two (Gini and information gain) by incorporating them into a decision tree classifier. An extensive repository of 45 publicly available OC datasets is presented, supporting the first experimental comparison of ordinal and nominal splitting criteria using well-known OC evaluation metrics. Statistical analysis of the results highlights OGini as the most effective ordinal splitting criterion to date. Source code, datasets, and results are made available to the research community.
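To illustrate why ordinal splitting criteria can matter, the sketch below contrasts nominal Gini with a plausible ordinal variant computed on the cumulative class distribution; the exact definitions of OGini, WIG, and RI in the survey may differ, so this is illustration only.

```python
# Nominal Gini impurity vs. an illustrative ordinal impurity on the cumulative distribution.
# The precise OGini/WIG/RI formulas studied in the paper may differ from this sketch.
import numpy as np

def gini(counts):
    """Nominal Gini impurity 1 - sum_k p_k^2; the class order is ignored."""
    p = np.asarray(counts, dtype=float)
    p /= p.sum()
    return float(1.0 - np.sum(p ** 2))

def ordinal_gini(counts):
    """Ordinal impurity sum_k F_k (1 - F_k) on the cumulative distribution F.

    Mass concentrated on adjacent ordered classes is penalised less than mass spread
    across distant classes, which nominal Gini cannot distinguish.
    """
    p = np.asarray(counts, dtype=float)
    p /= p.sum()
    F = np.cumsum(p)
    return float(np.sum(F * (1.0 - F)))

adjacent = [10, 10, 0, 0]   # confusion between neighbouring labels
distant  = [10, 0, 0, 10]   # confusion between the extreme labels
print(gini(adjacent), gini(distant))                  # identical: 0.5 and 0.5
print(ordinal_gini(adjacent), ordinal_gini(distant))  # 0.25 vs. 0.75: distant mix penalised more
```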


Efficient Decision Trees for Tensor Regressions

Luo, Hengrui, Horiguchi, Akira, Ma, Li

arXiv.org Machine Learning

In recent years, the intersection of tensor data analysis and non-parametric modeling (Guhaniyogi et al., 2017; Papadogeorgou et al., 2021; Wang and Xu, 2024) has garnered considerable interest among mathematicians and statisticians. Non-parametric tensor models have the potential to handle complex multi-dimensional data (Bi et al., 2021) and to represent spatial correlations between data entries. This paper addresses both scalar-on-tensor (i.e., predicting a scalar response from a tensor input) and tensor-on-tensor (i.e., both input and output are tensors) non-linear regression problems using recursive partitioning methods, often referred to as tree(-based) models. Supervised learning on tensor data, such as tensor regression, is highly relevant given the proliferation of multi-dimensional data in modern applications. Tensor data naturally arise in fields such as imaging (Wang and Xu, 2024), neuroscience (Li et al., 2018), and computer vision (Luo and Ma, 2023), where observations often take the form of multi-way arrays. Traditional regression models typically handle vector inputs and outputs, and thus can fail to capture the structural information embedded within tensor data.
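As a baseline that fixes the scalar-on-tensor setup (but deliberately ignores the tensor structure the paper exploits), one might flatten each tensor observation and fit a standard regression tree; the data and model below are illustrative assumptions, not the paper's method.

```python
# Structure-agnostic baseline for scalar-on-tensor regression: flatten and fit a standard tree.
# This discards the multi-way arrangement that tensor-aware recursive partitioning is meant to use.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8, 8, 3))           # 300 observations, each an 8x8x3 tensor
y = X[:, :4, :4, 0].sum(axis=(1, 2))          # scalar response driven by one spatial block

X_flat = X.reshape(len(X), -1)                # flattening loses the spatial arrangement
tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X_flat, y)
print(tree.score(X_flat, y))                  # in-sample fit of the naive baseline
```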


Learning accurate and interpretable decision trees

Balcan, Maria-Florina, Sharma, Dravyansh

arXiv.org Artificial Intelligence

Decision trees are a popular tool in machine learning and yield easy-to-understand models. Several techniques have been proposed in the literature for learning a decision tree classifier, with different techniques working well for data from different domains. In this work, we develop approaches to design decision tree learning algorithms given repeated access to data from the same domain. We propose novel parameterized classes of node splitting criteria in top-down algorithms, which interpolate between popularly used entropy and Gini impurity based criteria, and provide theoretical bounds on the number of samples needed to learn the splitting function appropriate for the data at hand. We also study the sample complexity of tuning prior parameters in Bayesian decision tree learning, and extend our results to decision tree regression. We further consider the problem of tuning hyperparameters in pruning the decision tree for classical pruning algorithms including min-cost complexity pruning. We also study the interpretability of the learned decision trees and introduce a data-driven approach for optimizing the explainability versus accuracy trade-off using decision trees. Finally, we demonstrate the significance of our approach on real world datasets by learning data-specific decision trees which are simultaneously more accurate and interpretable.
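One standard way to interpolate between Gini impurity and entropy with a single tunable parameter is the Tsallis family sketched below; the parameterized classes analysed in the paper may be defined differently, so treat this as an illustration of the idea rather than the paper's criterion.

```python
# Tsallis impurity: a one-parameter family containing both Shannon entropy and Gini impurity.
# Given as an illustration; the paper's parameterized splitting criteria may differ.
import numpy as np

def tsallis_impurity(p, alpha):
    """(1 - sum_k p_k^alpha) / (alpha - 1); alpha -> 1 gives entropy (nats), alpha = 2 gives Gini."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log(p)))
    return float((1.0 - np.sum(p ** alpha)) / (alpha - 1.0))

p = [0.7, 0.2, 0.1]
print(tsallis_impurity(p, 1.0))   # Shannon entropy
print(tsallis_impurity(p, 2.0))   # Gini impurity: 1 - sum p^2
print(tsallis_impurity(p, 1.5))   # a criterion "in between", selectable per dataset
```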


Building Trees for Probabilistic Prediction via Scoring Rules

Shashaani, Sara, Surer, Ozge, Plumlee, Matthew, Guikema, Seth

arXiv.org Machine Learning

Decision trees built with data remain in widespread use for nonparametric prediction. Predicting probability distributions is preferred over point predictions when uncertainty plays a prominent role in analysis and decision-making. We study modifying a tree to produce nonparametric predictive distributions. We find that the standard method for building trees may not result in good predictive distributions, and we propose changing the splitting criterion for trees to one based on proper scoring rules. Analysis of both simulated data and several real datasets demonstrates that using these new splitting criteria results in trees with improved predictive properties when the entire predictive distribution is considered.
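A minimal sketch of scoring candidate splits with a proper scoring rule (here the log score of each child's empirical class distribution) is given below; the specific scoring rules and tree-building details in the paper may differ, so the code is illustrative.

```python
# Choosing a split by a proper scoring rule (log score) instead of a standard impurity.
# The scoring rule and the threshold search are illustrative assumptions.
import numpy as np

def log_score(y, classes):
    """Negative log-likelihood of labels y under the leaf's empirical class distribution."""
    probs = np.clip(np.array([(y == c).mean() for c in classes]), 1e-12, 1.0)
    counts = np.array([(y == c).sum() for c in classes])
    return float(-np.sum(counts * np.log(probs)))

def split_score(x, y, threshold):
    """Total log score of the two children induced by x <= threshold (lower is better)."""
    classes = np.unique(y)
    left = x <= threshold
    return log_score(y[left], classes) + log_score(y[~left], classes)

rng = np.random.default_rng(0)
x = rng.normal(size=400)
y = (x + 0.3 * rng.normal(size=400) > 0).astype(int)

thresholds = np.quantile(x, np.linspace(0.1, 0.9, 17))
print(min(thresholds, key=lambda t: split_score(x, y, t)))  # expected near the true boundary at 0
```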


Era Splitting -- Invariant Learning for Decision Trees

DeLise, Timothy

arXiv.org Artificial Intelligence

Real-life machine learning problems exhibit distributional shifts in the data from one time to another or from one place to another. This behavior is beyond the scope of the traditional empirical risk minimization paradigm, which assumes that the data are i.i.d. over time and across locations. The emerging field of out-of-distribution (OOD) generalization addresses this reality with new theory and algorithms that incorporate environmental, or era-wise, information into the learning procedure. So far, most research has focused on linear models and/or neural networks. In this work we develop two new splitting criteria for decision trees, which allow us to apply ideas from OOD generalization research to decision tree models, including random forests and gradient-boosted decision trees. The new splitting criteria use era-wise information associated with each data point to let tree-based models find split points that are optimal across all disjoint eras in the data, instead of optimal over the entire data set pooled together, which is the default setting. We describe the problem setup in the context of financial markets, present the new splitting criteria in detail, and develop experiments to showcase their benefits, which improve out-of-sample metrics in our experiments. The new criteria are incorporated into a state-of-the-art gradient-boosted decision tree model in the Scikit-Learn code base, which is made freely available.
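An illustrative version of an era-wise split criterion is sketched below: the gain of a candidate split is computed separately within each era and then aggregated pessimistically (here by taking the minimum over eras), so only splits that help in every environment score well. The base gain and the aggregation rule are assumptions, not the paper's exact criteria.

```python
# Era-wise split gain: compute a per-era gain and aggregate with min() so that a split must be
# useful across all eras. The variance gain and the min-aggregation are illustrative choices.
import numpy as np

def variance_gain(x, y, threshold):
    """Reduction in squared error from splitting on x <= threshold."""
    left = x <= threshold
    if left.all() or (~left).all():
        return 0.0
    sse = lambda v: float(np.sum((v - v.mean()) ** 2))
    return sse(y) - sse(y[left]) - sse(y[~left])

def era_split_gain(x, y, era, threshold):
    """Minimum per-era gain: penalises splits that only work in a subset of eras."""
    return min(variance_gain(x[era == e], y[era == e], threshold) for e in np.unique(era))

rng = np.random.default_rng(0)
era = np.repeat([0, 1, 2], 100)
x = rng.normal(size=300)
y = np.where(x > 0, 1.0, -1.0) + 0.1 * rng.normal(size=300)   # signal stable across eras

print(variance_gain(x, y, 0.0))        # pooled gain over the whole data set (default setting)
print(era_split_gain(x, y, era, 0.0))  # era-wise gain rewarding splits that hold in every era
```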


Causal Inference Based Single-branch Ensemble Trees For Uplift Modeling

Zheng, Fanglan, Wang, Menghan, Li, Kun, Tian, Jiang, Xiang, Xiaojia

arXiv.org Artificial Intelligence

In this manuscript, we propose causal-inference-based single-branch ensemble trees for uplift modeling, namely CIET. Unlike standard classification methods that model predictive probabilities directly, CIET aims to estimate the change in the predicted probability of the outcome caused by an action or treatment. In CIET, two partition criteria are specifically designed to maximize the difference in outcome distribution between the treatment and control groups. A novel single-branch tree is then built with a top-down node partition approach, and the samples not covered by the node partition logic are censored. Repeating the tree-building process on the censored data yields single-branch ensemble trees with a set of inference rules. CIET is experimentally demonstrated to significantly outperform previous approaches to uplift modeling in terms of both the area under the uplift curve (AUUC) and the Qini coefficient. CIET has already been applied to online personal loans in a national financial holdings group in China, and it will also be of use to analysts applying machine learning techniques to causal inference in broader business domains such as web advertising, medicine, and economics.
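A rough sketch of the single-branch idea, under stated assumptions, is given below: at each step the rule whose covered subgroup maximises a treatment-vs-control divergence of the outcome distribution is accepted, the covered samples are censored, and the search repeats on the remainder. The divergence (here KL between outcome rates), the quantile-based rule search, and the stopping rules are illustrative, not CIET's exact partition criteria.

```python
# Sketch of single-branch rule extraction with a treatment-vs-control divergence criterion.
# The KL divergence, quantile thresholds, and stopping rules are illustrative assumptions.
import numpy as np

def kl_uplift(y, t):
    """KL divergence between treatment (t == 1) and control (t == 0) outcome distributions."""
    if t.sum() == 0 or (1 - t).sum() == 0:
        return 0.0
    p = np.clip(y[t == 1].mean(), 1e-6, 1 - 1e-6)
    q = np.clip(y[t == 0].mean(), 1e-6, 1 - 1e-6)
    return float(p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q)))

def single_branch_rules(X, y, t, n_rules=3, min_cover=30):
    rules, active = [], np.ones(len(y), dtype=bool)
    for _ in range(n_rules):
        best = None
        for j in range(X.shape[1]):
            for thr in np.quantile(X[active, j], [0.25, 0.5, 0.75]):
                cover = active & (X[:, j] <= thr)
                if cover.sum() < min_cover:
                    continue
                score = kl_uplift(y[cover], t[cover])
                if best is None or score > best[0]:
                    best = (score, j, thr, cover)
        if best is None:
            break
        score, j, thr, cover = best
        rules.append((j, float(thr), score))  # inference rule: feature j <= thr, with its uplift score
        active &= ~cover                      # censor samples covered by the accepted rule
    return rules

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
t = rng.integers(0, 2, 500)
y = np.where(X[:, 0] < 0, t, rng.integers(0, 2, 500))  # treatment effect concentrated at X[:, 0] < 0
print(single_branch_rules(X, y, t))
```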